Lecture 17 - Introduction to SQL
30 October, 2024
postgres. Please remember it, as you will need it to connect to the database5432 and the host is localhost (no need to change them)postgres and the default database is postgrespostgres with the right mouse button, then click on Connect Serverpostgres databaseDatabases, then postgres below it. After this, click with the right mouse button and go to Query ToolTABLE called drivers,) that contain the column_name and column_type, respectivelydriver_id is an int, the driver_name is a varchar with a maximum of 30 characters, the nationality is a varchar with a maximum of 15 characters, and the victories is an intdriver_id is the primary key, which means it is unique for each driverExecute (▶️) button to run the codeINSERT INTO commandVALUES to the bottom of the table;)INSERT INTO drivers VALUES (1, 'Lewis Hamilton','British', 103);
INSERT INTO drivers VALUES (4, 'Fernando Alonso', 'Spanish', 32);
INSERT INTO drivers VALUES (3, 'Sebastian Vettel', 'German', 91);
INSERT INTO drivers VALUES (2, 'Michael Schumacher', 'German', 53);Execute (▶️) button to run the codeData Output tab (bottom left) to see the results… but there’s nothing there! 😱:::
Data Output tab is empty because we need to run a SELECT command to see the data 😉SELECT command soon, but for now let’s just run the following command:*) from the drivers tableExecute (▶️) button to run the code (or press F5)drivers table on your owndriver_id for each new driver;)SQL Notebook icon on the left side of the screenNew SQL Connectionlocalhost (or any name you want)postgreslocalhost5432postgrespostgresCreate.sql with SQL Notebook17-introduction-sql.sql file in the repository for this lectureOpen With menu option (right button). Then, select the SQL Notebook formatSUM() or AVG()AS is an alias which assigns a name to the aggregate statisticSELECT SUM(victories) AS sum_victories,
COUNT(*) AS num_rows,
AVG(victories) AS mean_victories,
MIN(victories) AS min_victories,
MAX(victories) AS max_victories
FROM drivers;nationality and driver_id for drivers with more than 60 victoriesGROUP BY to group data by a columnSELECT nationality,
SUM(victories) AS sum_victories,
AVG(victories) AS mean_victories,
MIN(victories) AS min_victories,
MAX(victories) AS max_victories
FROM drivers
GROUP BY nationality;ROUND() to round the valuesSELECT nationality,
SUM(victories) AS sum_victories,
ROUND(AVG(victories), 1) AS mean_victories,
MIN(victories) AS min_victories,
MAX(victories) AS max_victories
FROM drivers
GROUP BY nationality;HAVING to filter variables after aggregatingWHERE, which is used to filter variables before aggregatingHAVING| SQL Command | Pandas Equivalent |
|---|---|
SELECT |
df[['column1', 'column2']] |
INSERT INTO |
df.loc[new_index] = new_row or df = df.append(new_row, ignore_index=True) |
WHERE |
df[df['column'] == value] |
GROUP BY |
df.groupby('column') |
HAVING |
df.groupby('column').filter(lambda x: condition) |
ROUND() |
df['column'].round(decimals) |
| Aggregate Functions | df.groupby('column').agg({'col1': 'sum', 'col2': 'mean', 'col3': 'min', 'col4': 'max'}) |